 automated generation


Automated Generation of Custom MedDRA Queries Using SafeTerm Medical Map

arXiv.org Artificial Intelligence

In pre-market drug safety review, grouping related adverse event terms into standardised MedDRA queries or the FDA Office of New Drugs Custom Medical Queries (OCMQs) is critical for signal detection. We present a novel quantitative artificial intelligence system that understands and processes medical terminology and automatically retrieves relevant MedDRA Preferred Terms (PTs) for a given input query, ranking them by a relevance score using multi-criteria statistical methods. The system (SafeTerm) embeds medical query terms and MedDRA PTs in a multidimensional vector space, then applies cosine similarity and extreme-value clustering to generate a ranked list of PTs. Validation was conducted against the FDA OCMQ v3.0 (104 queries), restricted to valid MedDRA PTs. Precision, recall and F1 were computed across similarity thresholds. High recall (>95%) is achieved at moderate thresholds, while higher thresholds improve precision (up to 86%). The optimal threshold (~0.70-0.75) yielded recall ~50% and precision ~33%. Narrow-term PT subsets performed similarly but required slightly higher similarity thresholds. The SafeTerm AI-driven system provides a viable supplementary method for automated MedDRA query generation. A similarity threshold of ~0.60 is recommended initially, with increased thresholds for refined term selection.
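The retrieval and validation steps described in this abstract can be illustrated with a minimal Python sketch. It only shows cosine-similarity ranking with a threshold and the precision/recall/F1 computation; the extreme-value clustering step is not reproduced. The function names, the toy 2-D vectors, and the reference set are all hypothetical and are not taken from SafeTerm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_terms(query_vec, term_vecs, threshold):
    """Return (term, score) pairs scoring at or above the threshold, best first."""
    scored = [(term, cosine(query_vec, vec)) for term, vec in term_vecs.items()]
    kept = [(term, score) for term, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def precision_recall_f1(retrieved, reference):
    """Set-based precision, recall and F1 of retrieved terms against a reference list."""
    retrieved, reference = set(retrieved), set(reference)
    true_pos = len(retrieved & reference)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy 2-D "embeddings", invented purely for illustration.
TERM_VECS = {
    "Headache": [1.0, 0.0],
    "Migraine": [0.8, 0.6],
    "Rash": [0.0, 1.0],
}
QUERY_VEC = [1.0, 0.0]
REFERENCE = {"Headache", "Migraine"}  # hypothetical reference query

for threshold in (0.0, 0.6, 0.9):
    terms = [term for term, _ in rank_terms(QUERY_VEC, TERM_VECS, threshold)]
    print(threshold, terms, precision_recall_f1(terms, REFERENCE))
```

In this toy setup, raising the threshold removes the unrelated term first and then a relevant one, which mirrors the precision/recall trade-off the abstract reports across thresholds.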


LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering

arXiv.org Artificial Intelligence

Aerospace manufacturing demands exceptionally high precision in technical parameters. The remarkable performance of Large Language Models (LLMs), such as GPT-4 and QWen, in Natural Language Processing has sparked industry interest in their application to tasks including process design, material selection, and tool information retrieval. However, LLMs are prone to generating "hallucinations" in specialized domains, producing inaccurate or false information that poses significant risks to the quality of aerospace products and flight safety. This paper introduces a set of evaluation metrics tailored for LLMs in aerospace manufacturing, aiming to assess their accuracy by analyzing their performance in answering questions grounded in professional knowledge. First, key information is extracted through in-depth textual analysis of classic aerospace manufacturing textbooks and guidelines. Next, using LLM generation techniques, we construct multiple-choice questions of varying difficulty, each with multiple correct answers. Different LLMs are then used to answer these questions, and their accuracy is recorded. Experimental results demonstrate that the capabilities of LLMs in aerospace professional knowledge are in urgent need of improvement. This study provides a theoretical foundation and practical guidance for the application of LLMs in aerospace manufacturing, addressing a critical gap in the field.
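Scoring multiple-choice questions that have several correct options can be sketched as follows. The abstract does not specify the paper's metrics, so the two measures below (strict exact-match accuracy and a Jaccard-style partial credit) are assumptions chosen for illustration, and the question IDs and answers are invented.

```python
def exact_match_accuracy(predictions, answer_key):
    """Fraction of questions where the chosen option set equals the gold set exactly."""
    correct = sum(
        1
        for qid, gold in answer_key.items()
        if set(predictions.get(qid, ())) == set(gold)
    )
    return correct / len(answer_key)


def mean_jaccard(predictions, answer_key):
    """Average overlap |chosen ∩ gold| / |chosen ∪ gold|, giving partial credit."""
    total = 0.0
    for qid, gold in answer_key.items():
        chosen, gold = set(predictions.get(qid, ())), set(gold)
        union = chosen | gold
        total += len(chosen & gold) / len(union) if union else 1.0
    return total / len(answer_key)


# Hypothetical answer key and model output; option letters per question.
ANSWER_KEY = {"q1": {"A", "C"}, "q2": {"B"}, "q3": {"A", "B", "D"}}
MODEL_OUTPUT = {"q1": {"A", "C"}, "q2": {"B", "C"}, "q3": {"A", "B"}}

print(exact_match_accuracy(MODEL_OUTPUT, ANSWER_KEY))
print(mean_jaccard(MODEL_OUTPUT, ANSWER_KEY))
```

Exact match penalizes both missing and extra options, which is the stricter choice when a partially wrong answer is as risky as a fully wrong one.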


BugSpotter: Automated Generation of Code Debugging Exercises

arXiv.org Artificial Intelligence

Debugging is an essential skill when learning to program, yet its instruction and emphasis often vary widely across introductory courses. In the era of code-generating large language models (LLMs), the ability for students to reason about code and identify errors is increasingly important. However, students frequently resort to trial-and-error methods to resolve bugs without fully understanding the underlying issues. Developing the ability to identify and hypothesize the cause of bugs is crucial but can be time-consuming to teach effectively through traditional means. This paper introduces BugSpotter, an innovative tool that leverages an LLM to generate buggy code from a problem description and verify the synthesized bugs via a test suite. Students interact with BugSpotter by designing failing test cases, where the buggy code's output differs from the expected result as defined by the problem specification. This not only provides opportunities for students to enhance their debugging skills, but also to practice reading and understanding problem specifications. We deployed BugSpotter in a large classroom setting and compared the debugging exercises it generated to exercises hand-crafted by an instructor for the same problems. We found that the LLM-generated exercises produced by BugSpotter varied in difficulty and were well-matched to the problem specifications. Importantly, the LLM-generated exercises were comparable to those manually created by instructors with respect to student performance, suggesting that BugSpotter could be an effective and efficient aid for learning debugging.
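The core check behind such an exercise, whether a student-designed test case actually exposes the bug, can be sketched in a few lines of Python. This is not BugSpotter's implementation: the example problem, the injected bug, and the use of a reference solution as a stand-in for the problem specification are all assumptions made for illustration.

```python
def reference_max(nums):
    """Stand-in for the problem specification: largest value in a non-empty list."""
    return max(nums)


def buggy_max(nums):
    """Hypothetical buggy variant: starts from 0, so it fails on all-negative input."""
    best = 0
    for n in nums:
        if n > best:
            best = n
    return best


def is_failing_test(test_input, buggy=buggy_max, reference=reference_max):
    """True if the buggy code's output differs from the expected output for this input."""
    return buggy(test_input) != reference(test_input)


print(is_failing_test([1, 5, 3]))   # input does not expose the bug
print(is_failing_test([-4, -2]))    # input exposes the bug
```

A student who submits `[-4, -2]` has found an input on which the two outputs diverge, which requires reasoning about the initial value rather than trial and error.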


AutoSpec: Automated Generation of Neural Network Specifications

arXiv.org Artificial Intelligence

The increasing adoption of neural networks in learning-augmented systems highlights the importance of model safety and robustness, particularly in safety-critical domains. Despite progress in the formal verification of neural networks, current practices require users to manually define model specifications: properties that dictate expected model behavior in various scenarios. This manual process, however, is prone to human error, limited in scope, and time-consuming. In this paper, we introduce AutoSpec, the first framework to automatically generate comprehensive and accurate specifications for neural networks in learning-augmented systems.

Each specification defines the expected model output for a given input space (§2.1). Specifically, researchers have relied on their domain knowledge and intuition about individual applications to manually create specifications. For instance, in adaptive video streaming, where a neural network is employed to determine the bitrate for the next video chunk based on recent network conditions, Eliyahu et al. (2021) define a specification as, "[if video] chunks were downloaded quickly (more quickly than it takes to play a chunk), the DNN should eventually not choose the worst resolution." Similar manual specifications are devised for other learning-augmented systems, e.g., database indexes (Tan et al., 2021), memory allocators (Wei et al., 2023), and job schedulers (Wu et al., 2022).


Model-based Workflow for the Automated Generation of PDDL Descriptions

arXiv.org Artificial Intelligence

Manually creating Planning Domain Definition Language (PDDL) descriptions is difficult, error-prone, and requires extensive expert knowledge. However, this knowledge is already embedded in engineering models and can be reused. Therefore, this contribution presents a comprehensive workflow for the automated generation of PDDL descriptions from integrated system and product models. The proposed workflow leverages Model-Based Systems Engineering (MBSE) to organize and manage system and product information, translating it automatically into PDDL syntax for planning purposes. By connecting system and product models with planning aspects, it ensures that changes in these models are quickly reflected in updated PDDL descriptions, facilitating efficient and adaptable planning processes. The workflow is validated within a use case from aircraft assembly.
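To make the translation step concrete, here is a minimal Python sketch that renders one PDDL action from a structured description. It does not reproduce the paper's MBSE-based workflow: the dictionary layout, the function name, and the assembly step with its predicates are hypothetical, and only the emitted PDDL action syntax follows the standard language.

```python
def render_action(name, parameters, preconditions, effects):
    """Render a PDDL action from typed parameters and lists of literal strings."""
    params = " ".join(f"?{var} - {typ}" for var, typ in parameters)
    pre = " ".join(preconditions)
    eff = " ".join(effects)
    return (
        f"(:action {name}\n"
        f"  :parameters ({params})\n"
        f"  :precondition (and {pre})\n"
        f"  :effect (and {eff}))"
    )


# Hypothetical assembly step, as it might be exported from a system model.
MOUNT_STEP = {
    "name": "mount",
    "parameters": [("p", "part"), ("s", "station")],
    "preconditions": ["(at ?p ?s)", "(free ?s)"],
    "effects": ["(mounted ?p)", "(not (free ?s))"],
}

pddl_text = render_action(**MOUNT_STEP)
print(pddl_text)
```

Because the action text is derived from the structured description, regenerating it after a model change is a single function call, which is the kind of quick update the workflow aims for.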


PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models

arXiv.org Artificial Intelligence

In recent times, a plethora of Large Code Generation Models (LCGMs) have been proposed, showcasing significant potential in assisting developers with complex programming tasks. Benchmarking LCGMs necessitates the creation of a set of diverse programming problems, where each problem comprises the prompt (including the task description), a canonical solution, and test inputs. The existing methods for constructing such a problem set can be categorized into two main types: manual methods and perturbation-based methods. However, manual methods demand high effort and lack scalability, while also risking data integrity due to LCGMs' potentially contaminated data collection. Perturbation-based approaches mainly generate semantically homogeneous problems with the same canonical solutions and introduce typos that can be easily auto-corrected by an IDE, making them ineffective and unrealistic. In this work, we propose the idea of programming problem merging (PPM) and provide two implementations of this idea. We apply our tool to two widely used datasets and compare it against nine baseline methods using eight code generation models. The results demonstrate the effectiveness of our tool in generating more challenging, diverse, and natural programming problems compared to the baselines.


Discovering Generalizable Skills via Automated Generation of Diverse Tasks

arXiv.org Artificial Intelligence

The learning efficiency and generalization ability of an intelligent agent can be greatly improved by utilizing a useful set of skills. However, the design of robot skills can often be intractable in real-world applications due to the prohibitive amount of effort and expertise that it requires. In this work, we introduce Skill Learning In Diversified Environments (SLIDE), a method to discover generalizable skills via automated generation of a diverse set of tasks. As opposed to prior work on unsupervised discovery of skills which incentivizes the skills to produce different outcomes in the same environment, our method pairs each skill with a unique task produced by a trainable task generator. To encourage generalizable skills to emerge, our method trains each skill to specialize in the paired task and maximizes the diversity of the generated tasks. A task discriminator defined on the robot behaviors in the generated tasks is jointly trained to estimate the evidence lower bound of the diversity objective. The learned skills can then be composed in a hierarchical reinforcement learning algorithm to solve unseen target tasks. We demonstrate that the proposed method can effectively learn a variety of robot skills in two tabletop manipulation domains. Our results suggest that the learned skills can effectively improve the robot's performance in various unseen target tasks compared to existing reinforcement learning and skill learning methods.


Automated Generation of Test Models from Semi-Structured Requirements

arXiv.org Machine Learning

[Context:] Model-based testing is an instrument for automated generation of test cases. It requires identifying requirements in documents, understanding them syntactically and semantically, and then translating them into a test model. One lightweight language for these test models is the Cause-Effect-Graph (CEG), which can be used to derive test cases. [Problem:] The creation of test models is laborious and we lack an automated solution that covers the entire process from requirement detection to test model creation. In addition, the majority of requirements are expressed in natural language (NL), which is hard to translate to test models automatically. [Principal Idea:] We build on the fact that not all NL requirements are equally unstructured. We found that 14% of the lines in requirements documents of our industry partner contain "pseudo-code"-like descriptions of business rules. We apply Machine Learning to identify such semi-structured requirements descriptions and propose a rule-based approach for their translation into CEGs. [Contribution:] We make three contributions: (1) an algorithm for the automatic detection of semi-structured requirements descriptions in documents, (2) an algorithm for the automatic translation of the identified requirements into a CEG and (3) a study demonstrating that our proposed solution leads to 86% time savings for test model creation without loss of quality.


Automated Generation of Conversational Non Player Characters

AAAI Conferences

An integral part of social believability in role playing games is the believability of non-player characters (NPCs). In this paper we argue for the importance of believability in NPCs, even those that are completely outside of any pre-written quest or plot. We present NPCAgency, a system designed to generate many conversational NPCs as packaged narrative assets that can be shared and imported into various projects to increase story-world immersion. We believe such a system can help solve two problems. First, the authorial burden of the game designer is lessened, allowing renderings of large numbers of NPCs, each with their own unique background and conversation topics, all conforming to the norms of a predefined “universe”. Second, the immersive aspect of the game is heightened as the player can engage complex characters with lengthy dialogue affordances. We demonstrate the concept by generating fifty characters with attributes drawn from the “Game of Thrones” (GOT) / “A Song of Ice and Fire” universe, and exporting them as code for Inform 7, a popular declarative interactive fiction (IF) programming language and authoring tool. A user study of thirty-seven Inform 7 programmers demonstrates that a 62% majority find the tool useful enough to consider for their own work. Furthermore, 70% said they would use the system to create “Game of Thrones” background characters for their own projects.


Automated Generation of Diverse NPC-Controlling FSMs Using Nondeterministic Planning Techniques

AAAI Conferences

We study the problem of generating a set of Finite State Machines (FSMs) modeling the behavior of multiple, distinct NPCs. We observe that nondeterministic planning techniques can be used to generate FSMs by following conventions typically used when manually creating FSMs modeling NPC behavior. We implement our ideas in DivNDP, the first algorithm for automated diverse FSM generation.